Development of the Hungarian WordNet Ontology and its Application to Information Extraction
Abstract
This paper outlines the construction of the Hungarian WordNet Ontology and describes an information extraction application that uses the ontology. […] and MorphoLogic) in a 3-year project funded by the European Union ECOP program (GVOP-AKF-2004-3.1.1.).
The Princeton WordNet (WN) linguistic ontology ([1]) has become a standard and invaluable semantic resource within the natural language technology community. The WN lexical semantic network consists of nodes called synsets (sets of synonymous words that are interchangeable in a context), which correspond to linguistic concepts, and connecting edges, which correspond to semantic relationships such as hypernymy (is-a relationship), meronymy (part-of relationship) and antonymy. The EuroWordNet (EWN) project ([5]) extended the WN architecture to a multilingual level, with the synsets of the English WN serving as an interlingua (ILI) among the concepts of the various other languages. A common starting set (Common Base Concepts) was implemented in each participating language and was then expanded individually in a top-down manner by each partner. In addition to the 11 European languages covered by EWN, the BalkaNet project ([4]) several years later introduced connected wordnets for 5 more Southeast European languages.
The Hungarian WordNet (HuWN) project builds on the BalkaNet project's resources: Princeton WordNet 2.0 as ILI, 8500 base concept synsets as a starting point, and the VisDic XML-based ontology/dictionary editor. We also decided to integrate existing semantic resources into HuWN. On the one hand, we tried to map each Hungarian synset to a sense in the EKSz Hungarian explanatory dictionary in order to obtain definitions. On the other hand, for each verbal synset we registered the corresponding entries in our existing verb frame description lexicon.
For nouns, adjectives and adverbs, the work followed the so-called expansion approach: we took the English WordNet as the starting point for our concept networks. We used several machine-translation heuristics ([2]) to obtain a rough translation of the English synsets, all of which were then manually examined, corrected and extended as necessary to fit the semantic conditions of the Hungarian language. For verbs, this approach proved unworkable because of the major differences between the English and Hungarian verb typing systems. In this case we used a mixed approach, translating only a subset of the common concepts and creating the rest from scratch from frequent items in corpora. We also added new semantic relations and the so-called nucleus structure ([3]) in order to represent aspects of verb meanings unique to …
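The synset network described in the abstract (synsets as nodes, semantic relations as edges, and an ILI link to the English WordNet) can be illustrated with a minimal Python sketch. This is not the HuWN or VisDic data format: the class `Synset`, the function `hypernym_chain`, and all identifiers such as `hu-kutya` and `ili-dog` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Synset:
    """One node of the network: a set of synonymous words (hypothetical layout)."""
    synset_id: str
    lemmas: List[str]
    ili: Optional[str] = None  # link to an English synset acting as interlingua
    relations: Dict[str, List[str]] = field(default_factory=dict)


def hypernym_chain(synsets: Dict[str, Synset], start_id: str) -> List[str]:
    """Follow the first 'hypernym' (is-a) edge upward until no parent is left."""
    chain: List[str] = []
    seen = {start_id}
    current = synsets[start_id]
    while True:
        parents = current.relations.get("hypernym", [])
        if not parents or parents[0] in seen:  # top reached, or a cycle
            break
        parent_id = parents[0]
        seen.add(parent_id)
        chain.append(parent_id)
        current = synsets[parent_id]
    return chain


# Toy data; the IDs and ILI keys are invented for this example.
SYNSETS = {
    "hu-kutya": Synset("hu-kutya", ["kutya", "eb"], ili="ili-dog",
                       relations={"hypernym": ["hu-allat"]}),
    "hu-allat": Synset("hu-allat", ["állat"], ili="ili-animal",
                       relations={"hypernym": ["hu-eloleny"]}),
    "hu-eloleny": Synset("hu-eloleny", ["élőlény"], ili="ili-organism"),
}

if __name__ == "__main__":
    print(hypernym_chain(SYNSETS, "hu-kutya"))
```

Storing relations as a dictionary keyed by relation name keeps the sketch open to further edge types (meronymy, antonymy, or language-specific relations) without changing the node class.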
Similar resources
Methods and Results of the Hungarian WordNet Project
This paper presents a complete outline of the results of the Hungarian WordNet (HuWN) project: the construction process of the general vocabulary Hungarian WordNet ontology, its validation and evaluation, the construction of a domain ontology of financial terms built on top of the general ontology, and two practical applications demonstrating the utilization of the ontology.
Domain Specific WordNet on Customs Law
The NLP research group at the University of Szeged took part in the development of the Hungarian WordNet between 2005 and 2007. In 2008, they developed a smaller, domain specific WordNet on customs law. This knowledge base contains about 650 concepts cautiously selected by legal experts from the relevant Hungarian statutory legal texts, above all, from two acts and from other laws and decrees. ...
Semantic Similarity Measures for the Development of Thai Dialog System
Semantic similarity plays an important role in a number of applications including information extraction, information retrieval, document clustering and ontology learning. Most work has concentrated on English and other European languages. However, for the Thai language, there has been no research about word semantic similarity. This paper presents an experiment and benchmark data sets investig...
Why are wordnets important?
Wordnets are lexical databases in which words are organized into clusters based on their meanings and linked to each other through different semantic and lexical relations. The first wordnet, the Princeton WordNet, was created for English; it was followed by various wordnets created within the framework of the EuroWordNet and BalkaNet projects, among others. Here we focus on ...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...